Gene finding by integrating gene finders

نویسندگان

  • Yudong Cai
  • Zhisong He
  • Lele Hu
  • Bing Li
  • Yi Zhou
  • Han Xiao
  • Zhiwen Wang
  • Kairui Feng
  • Lin Lu
  • Kaiyan Feng
  • Haipeng Li
چکیده

Gene finding, the accurate annotation of genomic DNA, has become one of the central topics in biological research. Although various computational methods (gene finders) have been proposed and developed, they all have their own limitations in gene findings. In this paper, we introduce an integrating gene finder, which combines the results of several existing gene finders together, to improve the accuracy of gene finding. Four integration schemes, based on majority voting, are developed for the analysis of two datasets – the basic dataset and the testing dataset. The basic dataset consists of 1500 DNA sequences and the testing dataset consists of 103 DNA sequences. It is demonstrated that a simple integration (a simple voting for each nucleotide) can significantly improve the finding performance, and removing confusing gene finders, caused by poor performance or redundant results, is important for a further improvement of the integration. The best prediction results are obtained using weighted majority voting, aided by the mRMR (Minimum Redundancy Maximum Relevance) (Peng, 2005) method for the gene finder selection. The prediction accuracies are 84.16% and 90.06% for the basic dataset and testing dataset respectively, which are better than any individual gene finding software in our research.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Discovery of transcription factor binding sites through integration of generic motif finders

Locating transcription factor binding sites is a key step in understanding gene regulation. Due to its importance, many de novo motif finding methods (e.g. MEME, MotifSampler, Mitra and Weeder) have been proposed. Individually, these motif finders perform unimpressively overall based on Tompa’s benchmark datasets. Moreover, these motif finders vary in their definitions of what constitute a moti...

متن کامل

DIGIT: A Novel Gene Finding Program by Combining Gene-Finders

We have developed a general purpose algorithm which finds genes by combining plural existing gene-finders. The algorithm has been implemented into a novel gene-finder named DIGIT. An outline of the algorithm is as follows. First, existing gene-finders are applied to an uncharacterized genomic sequence (input sequence). Next, DIGIT produces all possible exons from the results of gene-finders, an...

متن کامل

Evaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques

Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen's genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab ...

متن کامل

Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure

MOTIVATION Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other. RESULTS We present a method for combining ge...

متن کامل

Development of joint application strategies for two microbial gene finders

MOTIVATION As a starting point in annotation of bacterial genomes, gene finding programs are used for the prediction of functional elements in the DNA sequence. Due to the faster pace and increasing number of genome projects currently underway, it is becoming especially important to have performant methods for this task. RESULTS This study describes the development of joint application strate...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010